This crate implements Rerun’s code generation tools.
These tools translate language-agnostic IDL definitions (flatbuffers) into code.
They are invoked by `pixi run codegen`.
§Organization
The code generation process happens in 4 phases.
§1. Generate binary reflection data from flatbuffers definitions.
All this does is invoke the flatbuffers compiler (`flatc`) with the right flags in order to generate the binary dumps.
Look for `compile_binary_schemas` in the code.
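As a rough illustration, such an invocation could be assembled with the standard library alone. The helper name `flatc_reflection_command` and the exact flag set are assumptions made for this sketch; the flags this crate actually passes are the ones found in `compile_binary_schemas`.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) a `flatc` invocation that dumps binary
/// reflection data for a single schema file.
///
/// Hypothetical helper: the flags below are a plausible minimal set,
/// not necessarily the ones used by this crate.
fn flatc_reflection_command(out_dir: &Path, schema: &Path) -> Command {
    let mut cmd = Command::new("flatc");
    cmd.arg("-o")
        .arg(out_dir) // where the binary dump is written
        .arg("--binary") // emit binary output
        .arg("--schema") // serialize the schema itself rather than data
        .arg("--bfbs-comments") // keep doc comments in the reflection dump
        .arg(schema);
    cmd
}
```

Calling `.status()` on the returned command would run the compiler; keeping construction separate from execution makes the flag set easy to inspect.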
§2. Run the semantic pass.
The semantic pass transforms the low-level raw reflection data generated by the first phase into higher level objects that are much easier to inspect/manipulate and overall friendlier to work with.
Look for `objects.rs`.
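To give a feel for what "higher level" means here, the sketch below turns a raw, stringly-typed record into a typed object whose kind is derived from its package root. All names (`RawObject`, `ToyObject`, `ToyKind`, `resolve`) are hypothetical, and the `rerun.datatypes` and `rerun.archetypes` roots are assumed by analogy with `rerun.components`; the real types live in `objects.rs`.

```rust
/// Stand-in for the raw reflection data produced by phase one (hypothetical).
struct RawObject {
    fqname: String,
    field_names: Vec<String>,
}

/// The kind of an object, derived from its package root (hypothetical).
#[derive(Debug, PartialEq)]
enum ToyKind {
    Datatype,
    Component,
    Archetype,
}

/// A friendlier, typed view of a raw object (hypothetical).
#[derive(Debug, PartialEq)]
struct ToyObject {
    fqname: String,
    name: String,
    kind: ToyKind,
    fields: Vec<String>,
}

/// Resolves a raw object, failing early with the fully-qualified name
/// attached when the package root is not recognized.
fn resolve(raw: &RawObject) -> Result<ToyObject, String> {
    let kind = if raw.fqname.starts_with("rerun.datatypes.") {
        ToyKind::Datatype
    } else if raw.fqname.starts_with("rerun.components.") {
        ToyKind::Component
    } else if raw.fqname.starts_with("rerun.archetypes.") {
        ToyKind::Archetype
    } else {
        return Err(format!("unknown package root for {}", raw.fqname));
    };
    // The short name is the last dot-separated segment of the fqname.
    let name = raw.fqname.rsplit('.').next().unwrap_or("").to_owned();
    Ok(ToyObject {
        fqname: raw.fqname.clone(),
        name,
        kind,
        fields: raw.field_names.clone(),
    })
}
```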
§3. Fill the Arrow registry.
The Arrow registry keeps track of all type definitions and maps them to Arrow datatypes.
Look for `arrow_registry.rs`.
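The idea of a registry that maps type definitions to datatypes, with some datatypes left unresolved until they are looked up, can be sketched as follows. `ToyDatatype` and `ToyRegistry` are hypothetical stand-ins that avoid depending on `arrow2`, and the type names used with them are made up; the real implementation is in `arrow_registry.rs`.

```rust
use std::collections::HashMap;

/// A tiny stand-in for an Arrow datatype (hypothetical).
#[derive(Clone, Debug, PartialEq)]
enum ToyDatatype {
    Float32,
    Struct(Vec<(String, ToyDatatype)>),
    /// A reference to another registered type, by fully-qualified name.
    Unresolved(String),
}

/// Maps fully-qualified type names to their datatypes (hypothetical).
#[derive(Default)]
struct ToyRegistry {
    types: HashMap<String, ToyDatatype>,
}

impl ToyRegistry {
    fn register(&mut self, fqname: &str, datatype: ToyDatatype) {
        self.types.insert(fqname.to_owned(), datatype);
    }

    /// Recursively replaces every `Unresolved` reference with its definition.
    /// Returns `None` if a referenced type was never registered.
    /// Note: this sketch does not guard against cyclic references.
    fn resolve(&self, datatype: &ToyDatatype) -> Option<ToyDatatype> {
        match datatype {
            ToyDatatype::Unresolved(fqname) => self.resolve(self.types.get(fqname)?),
            ToyDatatype::Struct(fields) => {
                let mut resolved = Vec::with_capacity(fields.len());
                for (name, field_type) in fields {
                    resolved.push((name.clone(), self.resolve(field_type)?));
                }
                Some(ToyDatatype::Struct(resolved))
            }
            other => Some(other.clone()),
        }
    }
}
```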
§4. Run the actual codegen pass for a given language.
Two codegen passes are currently implemented: Python & Rust.
Codegen passes use the semantic objects from phase two and the registry from phase three in order to generate user-facing code for Rerun’s SDKs.
These passes are intentionally implemented using a very low-tech, no-frills approach (stitch strings together, make liberal use of `unimplemented`, etc.) that keeps them flexible in the face of ever-changing needs in the generated code.
Look for `codegen/python.rs` and `codegen/rust.rs`.
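The "stitch strings together" style can be pictured with a small sketch like the one below. `ToyStruct`, `quote_type` and `quote_struct` are hypothetical names, and the type mapping is deliberately incomplete; the actual passes are in `codegen/python.rs` and `codegen/rust.rs`.

```rust
/// Simplified stand-in for a semantic object (hypothetical).
struct ToyStruct {
    name: String,
    docs: Vec<String>,
    /// Pairs of (field name, flatbuffers scalar type).
    fields: Vec<(String, String)>,
}

/// Maps a flatbuffers scalar type to a Rust type, bailing out loudly on
/// anything not handled yet.
fn quote_type(fbs_type: &str) -> &'static str {
    match fbs_type {
        "float" => "f32",
        "uint" => "u32",
        other => unimplemented!("no Rust mapping for {}", other),
    }
}

/// Emits a Rust struct definition by plain string concatenation.
fn quote_struct(obj: &ToyStruct) -> String {
    let mut code = String::new();
    for line in &obj.docs {
        code.push_str(&format!("/// {}\n", line));
    }
    code.push_str(&format!("pub struct {} {{\n", obj.name));
    for (name, fbs_type) in &obj.fields {
        code.push_str(&format!("    pub {}: {},\n", name, quote_type(fbs_type)));
    }
    code.push_str("}\n");
    code
}
```

Nothing here is clever, which is the point: adding a new case is a matter of adding a match arm or another `push_str`.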
§Error handling
Keep in mind: this is all build-time code that will never see the light of runtime. There is therefore no need for fancy error handling in this crate: all errors are fatal to the build anyway.
Make sure to crash as soon as possible when something goes wrong and to attach all the appropriate/available context using `anyhow`’s `with_context` (e.g. always include the fully-qualified name of the faulty type/field) and you’re good to go.
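The pattern of attaching the fully-qualified name to an error can be sketched without any dependency. The `WithContext` trait below is a std-only stand-in written for this example so that it compiles on its own; it is not `anyhow`'s API, although its `with_context` method is shaped after it. The `parse_array_len` helper and the field name used with it are hypothetical.

```rust
use std::fmt::Display;

/// Std-only stand-in for the `with_context` method that `anyhow` provides
/// on `Result` (hypothetical; real code would use `anyhow` directly).
trait WithContext<T> {
    fn with_context<C: Display, F: FnOnce() -> C>(self, f: F) -> Result<T, String>;
}

impl<T, E: Display> WithContext<T> for Result<T, E> {
    fn with_context<C: Display, F: FnOnce() -> C>(self, f: F) -> Result<T, String> {
        // The context closure only runs on the error path.
        self.map_err(|err| format!("{}: {}", f(), err))
    }
}

/// Parses an array length found on a field, naming the faulty field in
/// the error message (hypothetical helper).
fn parse_array_len(field_fqname: &str, raw: &str) -> Result<usize, String> {
    raw.parse::<usize>().with_context(|| {
        format!("invalid array length {:?} for field {}", raw, field_fqname)
    })
}
```

In build-time code, the caller would typically just `unwrap` or propagate this to `main`, so the build fails with the full context in the message.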
§Testing
Same comment as with error handling: this code becomes irrelevant at runtime, and so testing it brings very little value.
Make sure to test the behavior of its output though: `re_types`!
§Understanding the subtleties of affixes
So-called “affixes” are effects applied to objects defined with the Rerun IDL and that affect the way these objects behave and interoperate with each other (so, yes, monads. shhh.).
There are 3 distinct and very common affixes used when working with Rerun’s IDL: transparency, nullability and plurality.
Broadly, we can describe these affixes as follows:
- Transparency allows for bypassing a single layer of typing (e.g. to “extract” a field out of a struct).
- Nullability specifies whether a piece of data is allowed to be left unspecified at runtime.
- Plurality specifies whether a piece of data is actually a collection of that same type.
We say “broadly” here because the way these affixes ultimately affect objects in practice will actually depend on the kind of object that they are applied to, of which there are 3: archetypes, components and datatypes.
Not only that, but objects defined in Rerun’s IDL are materialized into 3 distinct environments: IDL definitions, Arrow datatypes and native code (e.g. Rust & Python).
These environments have vastly different characteristics, quirks, pitfalls and limitations, which once again lead to these affixes having different, sometimes surprising behavior depending on the environment we’re interested in. Also keep in mind that Flatbuffers and native code are generally designed around arrays of structures, while Arrow is all about structures of arrays!
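The difference between the two layouts is easy to see in a small sketch. `Point`, `PointColumns` and `to_columns` are hypothetical names chosen for illustration, and plain `Vec`s stand in for Arrow arrays.

```rust
/// Array-of-structures element: how native code usually models a point.
#[derive(Clone, Debug, PartialEq)]
struct Point {
    x: f32,
    y: f32,
}

/// Structure-of-arrays layout: one contiguous column per field, which is
/// closer to how Arrow stores a struct array.
#[derive(Debug, Default, PartialEq)]
struct PointColumns {
    x: Vec<f32>,
    y: Vec<f32>,
}

/// Converts a slice of points (AoS) into per-field columns (SoA).
fn to_columns(points: &[Point]) -> PointColumns {
    PointColumns {
        x: points.iter().map(|p| p.x).collect(),
        y: points.iter().map(|p| p.y).collect(),
    }
}
```

Serialization code has to perform this kind of transposition in one direction, and deserialization code in the other.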
All in all, these interactions between affixes, object kinds and environments lead to a combinatorial explosion of edge cases that can be very confusing when it comes to (de)serialization code, and even API design.
When in doubt, check out the `rerun.testing.archetypes.AffixFuzzer` IDL definitions, generated code and test suites for definitive answers.
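As a rough mental model, and only for the native Rust environment, the three affixes could surface as shown below. This is a sketch under assumptions: the `Meters` and `Example` types are hypothetical, and the code actually generated for a given object kind may differ.

```rust
/// A wrapper type with a single field (hypothetical).
#[derive(Debug, PartialEq)]
struct Meters(f32);

/// How each affix might show up on a field in native code (hypothetical).
struct Example {
    /// No affix: the value goes through the wrapper type.
    plain: Meters,
    /// Transparency: one layer of typing is bypassed, exposing the inner field.
    transparent: f32,
    /// Nullability: the value may be left unspecified at runtime.
    nullable: Option<f32>,
    /// Plurality: a collection of the same type.
    plural: Vec<f32>,
    /// Nullability and plurality combined: "unspecified" and "empty" are
    /// two different states.
    nullable_plural: Option<Vec<f32>>,
}

/// Distinguishes the three states a nullable, plural field can be in.
fn describe(values: &Option<Vec<f32>>) -> &'static str {
    match values {
        None => "unspecified",
        Some(v) if v.is_empty() => "empty",
        Some(_) => "non-empty",
    }
}
```

Even this small combination already has three observable states, which hints at how quickly the edge cases multiply once object kinds and environments enter the picture.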
Structs§
- Computes and maintains a registry of `arrow2::datatypes::DataType`s for specified flatbuffers definitions.
- A collection of arbitrary attributes.
- A high-level representation of the contents of a flatbuffer docstring.
- A yet-to-be-resolved `arrow2::datatypes::Field`.
- A high-level representation of a flatbuffers object, which can be either a struct, a union or an enum.
- A high-level representation of a flatbuffers field, which can be either a struct member or a union value.
- The result of the semantic pass: an intermediate representation of all available object types; including structs, enums and unions.
Enums§
- The underlying element type for arrays/vectors/maps.
- A yet-to-be-resolved `arrow2::datatypes::DataType`.
- Is this a struct, enum, or union?
- The kind of the object, as determined by its package root (e.g. `rerun.components`).
- The underlying type of an `ObjectField`.
Traits§
- Implements the formatting pass.
- Implements the codegen pass.
Functions§
- Compiles binary reflection dumps from flatbuffers definitions.
- This will automatically emit a `rerun-if-changed` clause for all the files that were hashed.
- Also triggers a re-build if anything that affects the hash changes.
- Generates C++ code.
- Generate flatbuffers definition files.
- Handles the first 3 language-agnostic passes of the codegen pipeline:
- Generates Python code.
- Generates Rust code.
- Verifies that a buffer of bytes contains a `Schema` and returns it. Note that verification is still experimental and may not catch every error, or be maximally performant. For the previous, unchecked, behavior use `root_as_schema_unchecked`.
Type Aliases§
- In-memory generated files.